PhD

The LaTeX sources of my Ph.D. thesis
git clone https://esimon.eu/repos/PhD.git

supervision.tex


\section{The Problem of Data Scarcity}
\label{sec:relation extraction:supervision}
Ideally, a labeled dataset should be available for the source language and target relation domain \(\relationSet\), but alas, this is rarely the case.
In particular, the order of \(\relationSet\) can number in the thousands, in which case accurate labeling is tedious for human operators.
To circumvent this problem, alternative supervision strategies have been used.

Despite the ubiquity of the terms, it is not easy to define the different forms of supervision clearly.
We use the following practical definition: a dataset is supervised if among its features, one---the labels---must be predicted from the others.
Furthermore, to distinguish from the self-supervised setup, we need to impose that the labels must be somewhat hard to obtain, typically through manual annotation.%
\sidenote[][-30mm]{
	To add to the confusion, the distinction between self-supervised and unsupervised is not necessarily pertinent, e.g.~Yann LeCun retired ``unsupervised'' from his vocabulary, replacing it with ``self-supervised'' \parencite{selfsupervised}.
	In this case, the difficulty of obtaining the labels might be the sole difference between the ``unsupervised/self-supervised'' and ``supervised'' setups.
}
For our task at hand, a supervised dataset takes the form \(\dataSet_\relationSet\subseteq\sentenceSet\times\entitySet^2\times\relationSet\); indeed, we seek to predict relation labels, and obtaining those is tedious and error-prone.
On the other hand, an unsupervised dataset takes the form \(\dataSet\subseteq\sentenceSet\times\entitySet^2\), which is much easier to obtain: vast amounts of text are now digitized and can be processed by an entity chunker and an entity linker.
An intermediate setting is semi-supervision, in which a small subset of samples is supervised while the others are left unsupervised; this can be stated as \(\dataSet_\text{semi}\subseteq\sentenceSet\times\entitySet^2\times(\relationSet\cup\{\varepsilon\})\).%
\sidenote[][-24mm]{
	Here, we denote by \(\varepsilon\) the absence of a label for a sample since this is often reflected by an empty field.
}
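To make these three dataset forms concrete, the following Python sketch is purely illustrative (the \texttt{Sample} type and the Wikidata-style identifiers are our own assumptions, not part of any actual implementation); a missing label \(\varepsilon\) is represented by \texttt{None}:
\begin{verbatim}
from typing import NamedTuple, Optional

class Sample(NamedTuple):
    sentence: str            # s, an element of the set of sentences
    head: str                # e1, e.g. a knowledge-base identifier
    tail: str                # e2
    relation: Optional[str]  # r, or None standing for epsilon

# Supervised sample: the relation label is given.
supervised = Sample("Canada is a common-law country.",
                    "Q16", "Q1151466", "hyponym of")

# Unsupervised sample: only the sentence and its entity pair.
unsupervised = Sample("Canada is a common-law country.",
                      "Q16", "Q1151466", None)
\end{verbatim}
A semi-supervised dataset simply mixes samples of both kinds.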

While a relation extraction model can be trained on these different kinds of datasets, evaluating such a model is nearly always done using a supervised dataset \(\dataSet_\relationSet\).
In this section, we present two other approaches to train a model without manual labeling: bootstrap and distant supervision.

\subsection{Bootstrap}
\label{sec:relation extraction:bootstrap}
\begin{marginalgorithm}[-5cm]
	\input{mainmatter/relation extraction/bootstrap algorithm.tex}
	\scaption[The bootstrap algorithm.]{
		The bootstrap algorithm.
		Occurrences are simply a set of samples \(O\subseteq\dataSet\) conveying the target relation.
		The algorithm can be seeded with either a set of occurrences \(O\) \parencite{dipre} or a set of rules \(R\) \parencite{hearst_hyponyms}.
		When starting with a set of occurrences, the algorithm must first extract a set of rules, then alternate between finding occurrences and rules as listed.
		\label{alg:relation extraction:bootstrap}
	}
\end{marginalgorithm}

Another method to deal with the scarcity of data is to use bootstrap.
Early approaches to relation extraction often focused on a single relation and fell into this category of bootstrapped methods.
The bootstrap process (Algorithm~\ref{alg:relation extraction:bootstrap}) starts with a small amount of labeled data and finds extraction rules by generalizing to a large amount of unlabeled data.
As such, it is a semi-supervised approach.
We now describe this algorithm by following the work that pioneered it.
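As a rough illustration of the alternation at the core of Algorithm~\ref{alg:relation extraction:bootstrap}, consider the following Python sketch; the helpers \texttt{extract\_rules} and \texttt{match\_rules} are hypothetical placeholders for the generalization and matching procedures, whose implementation is what distinguishes concrete systems:
\begin{verbatim}
def bootstrap(seed_occurrences, corpus, n_iterations=5):
    """Alternate between inducing rules from known occurrences
    and matching those rules to find new occurrences."""
    occurrences = set(seed_occurrences)  # O: samples conveying the relation
    rules = set()                        # R: extraction rules
    for _ in range(n_iterations):
        # Generalize the current occurrences into extraction rules.
        # (extract_rules and match_rules are placeholders.)
        rules |= extract_rules(occurrences, corpus)
        # Apply the rules to the corpus to find new occurrences.
        occurrences |= match_rules(rules, corpus)
    return occurrences, rules
\end{verbatim}
Seeding with rules instead of occurrences simply amounts to entering the loop at the matching step.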

\textcite{hearst_hyponyms} propose a method to detect a single relation between noun phrases: hyponymy.
They define \(e_1\) to be a hyponym of \(e_2\) when the sentence ``An \(e_1\) is a (kind of) \(e_2\).'' is acceptable to an English speaker.
This relation is then detected in a corpus using lexico-syntactic patterns such as:%
\sidenote{The syntax used here is inspired by regular expressions: ``\texttt{()}'' are used for grouping, ``\texttt{?}'' indicates the previous atom is optional, ``\texttt{|}'' is used for alternatives and ``\texttt{*}'' is the Kleene star meaning zero or more repetitions.}
\begin{indentedexample}
	\raggedright
	\(e_1\) ,\texttt{?} including \texttt{(\textrm{\(e_2\),})*} \texttt{(\textrm{or}|\textrm{and})?} \(e_3\)\\
	\(\implies \tripletHolds{e_2}{\textsl{hyponym of\/}}{e_1}\)\\
	\(\implies \tripletHolds{e_3}{\textsl{hyponym of\/}}{e_1}\)\\
\end{indentedexample}
where the entities \(e_i\) are constrained to be noun phrases.
This rule matches on the following sentence:
\begin{indentedexample}
	\raggedright
	All common-law countries, including Canada and England\dots\\
	\(\implies \tripletHolds{\text{Canada}}{\textsl{hyponym of\/}}{\text{Common-law country}}\)\\
	\(\implies \tripletHolds{\text{England}}{\textsl{hyponym of\/}}{\text{Common-law country}}\)\\
\end{indentedexample}
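Such a pattern can be approximated with an ordinary regular expression; the following Python snippet is a deliberately crude sketch of our own (it approximates noun phrases with word sequences and skips the noun-phrase chunking and lemmatization a real system would perform):
\begin{verbatim}
import re

# Simplified version of "e1 ,? including (e2 ,)* (or|and)? e3".
PATTERN = re.compile(
    r"(?P<hypernym>[\w-]+(?: [\w-]+)*)"  # e1, the hypernym
    r",? including "
    r"(?P<hyponyms>\w+(?:, \w+)*(?:,? (?:or|and) \w+)?)"
)

sentence = "All common-law countries, including Canada and England"
match = PATTERN.search(sentence)
if match:
    for hyponym in re.split(r",? (?:or|and) |, ", match.group("hyponyms")):
        print(f"hyponym-of({hyponym}, {match.group('hypernym')})")
# hyponym-of(Canada, All common-law countries)
# hyponym-of(England, All common-law countries)
\end{verbatim}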

\Textcitex{hearst_hyponyms} propose the following process: start with known facts such as \(\operatorname{hyponym}(\text{England}, \text{Country})\), find all places where the two entities co-occur in the corpus, and write new rules from the observed patterns; these rules in turn uncover new facts with which the process can be repeated.
Aside from some basic lemmatization---which explains why ``countries'' became ``country'' in the example above---all noun phrases are treated as possible entities.
This is sensible since the end goal of the approach is to generate new facts for the WordNet knowledge base.
In \textcite{hearst_hyponyms}, writing new rules was not automated but performed manually.
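The fact-to-rule direction can nonetheless be sketched in code; in the toy function below (our own illustration, ignoring lemmatization), the infix between two co-occurring mentions is collected as a candidate pattern, which would then be generalized by hand:
\begin{verbatim}
def candidate_patterns(fact, corpus):
    """Collect the infixes between two known co-occurring entities;
    each infix is a candidate extraction rule (generalized by hand
    in Hearst's work)."""
    e1, e2 = fact
    patterns = set()
    for sentence in corpus:
        if e1 in sentence and e2 in sentence:
            start = sentence.index(e1) + len(e1)
            end = sentence.index(e2)
            if start <= end:
                patterns.add(sentence[start:end])
    return patterns

corpus = ["All common-law countries, including Canada and England..."]
print(candidate_patterns(("countries", "Canada"), corpus))
# {', including '}
\end{verbatim}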

Following equation~\ref{eq:relation extraction:sentential definition}, a sentential relation extraction system usually defines a relation \(r\) as a subset of \(\sentenceSet\times\entitySet\times\entitySet\), i.e.\ relations are conveyed jointly by sentences and entity pairs.
In contrast, \textcite{hearst_hyponyms} make the following assumption:
\begin{marginparagraph}
	The assumption of \textcite{hearst_hyponyms} is that there are two morphisms \(\sentenceSet\to\relationSet\) and \(\entitySet^2\to\relationSet\), therefore \(\dataSet\) must have a form which makes this decomposition possible: \((s, \vctr{e})\in\dataSet\) if and only if \(s\) and \(\vctr{e}\) are mapped to the same relation.
	In other words, \(\dataSet\) completes the two relation extraction morphisms to a commutative square:
	\begin{center}
		\input{mainmatter/relation extraction/pullback.tex}
	\end{center}
	In category theory, this object is called a pullback and denoted \(\times_\relationSet\).
	This also means that given a sample from \(\dataSet\), it is possible to find its relation without looking at its sentence or its entities since either of them is sufficient.
\end{marginparagraph}
\begin{assumption}{pullback}
	It is possible to find the relation conveyed by a sample by looking at the entities alone and ignoring the sentence; and conversely by looking at the sentence alone and ignoring the entities.

	\smallskip
	\noindent
	\(\dataSet = \sentenceSet\times_\relationSet\entitySet^2.\)
\end{assumption}
This implies that given a pair of entities, the conveyed relation is the same whatever the sentence in which they appear.
Conversely, given a sentence, the conveyed relation is the same whatever the entities.
As such, the representation of a relation is split into two parts:
\begin{description}
	\item[a set of entity pairs] \(r_\entitySet \subseteq \entitySet^2\), which can be represented exactly;
	\item[a set of sentences] \(r_\sentenceSet \subseteq \sentenceSet\), which in \textcite{hearst_hyponyms} was represented by a set of patterns matching only sentences in \(r_\sentenceSet\), such as ``\(e_1\) ,\texttt{?} including \texttt{(\textrm{\(e_2\),})*} \texttt{(\textrm{or}|\textrm{and})?} \(e_3\).''
\end{description}
Given a dataset \(\dataSet\subseteq\sentenceSet\times\entitySet^2\), it is possible to map from \(r_\entitySet\) to \(r_\sentenceSet\) by taking all sentences where the two entities appear, and vice versa by taking all pairs of entities appearing in the given sentences.
The second process \(\relationSet_\sentenceSet\times \dataSet\to \relationSet_\entitySet\) is straightforward to implement exhaustively, while the first process \(\relationSet_\entitySet\times \dataSet\to \relationSet_\sentenceSet\) was performed manually by \textcite{hearst_hyponyms}.
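Both directions can be written down over a dataset of \((s, e_1, e_2)\) triples; a minimal sketch of our own, assuming sentences and entities are plain values:
\begin{verbatim}
def pairs_to_sentences(r_E, dataset):
    """r_E -> r_S: all sentences in which a pair from r_E appears.
    Generalizing these sentences into patterns was the manual step."""
    return {s for (s, e1, e2) in dataset if (e1, e2) in r_E}

def sentences_to_pairs(r_S, dataset):
    """r_S -> r_E: all entity pairs appearing in a sentence of r_S.
    This direction is exhaustive and straightforward."""
    return {(e1, e2) for (s, e1, e2) in dataset if s in r_S}
\end{verbatim}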

\subsection{Distant Supervision}
\label{sec:relation extraction:distant supervision}
\textcitex{distant_early}[-5mm] introduced the idea of weak supervision to relation extraction as a compromise between hand-labeled datasets and unsupervised training.
It was then popularized by \textcitex{distant} under the name \emph{distant supervision}.
Their idea is to use a knowledge base \(\kbSet\subseteq\entitySet^2\times\relationSet\) to supervise an unsupervised dataset \(\dataSet\).
The underlying assumption can be stated as:
\begin{marginparagraph}
	The use of assumptions or modeling hypotheses denoted \hypothesis{name} is central to several relation extraction models, especially unsupervised ones.
	We strongly encourage the reader to look at the list of assumptions in Appendix~\ref{chap:assumptions}.
	The appendix provides counter-examples when appropriate.
	Furthermore, it lists the sections in which each assumption was introduced for reference.
\end{marginparagraph}
\begin{assumption}{distant}
	A sentence conveys all the possible relations between all the entities it contains.

	\smallskip
	\noindent
	\(\dataSet_\relationSet = \dataSet \bowtie \kbSet\)

	\smallskip
	\noindent
	where \(\bowtie\) denotes the natural join operator:
	\begin{equation*}
		\dataSet \bowtie \kbSet =
			\left\{\,
				(s, e_1, e_2, r)
				\mid
				(s, e_1, e_2)\in\dataSet
				\land
				(e_1, e_2, r)\in\kbSet
			\,\right\}.
	\end{equation*}
\end{assumption}
In other words, each sentence \((s, e_1, e_2)\in\dataSet\) is labeled by all relations \(r\) present between \(e_1\) and \(e_2\) in the knowledge base \(\kbSet\).
This is sometimes referred to as an unaligned dataset, since sentences are not aligned with their corresponding facts.
The assumption \hypothesis{distant} is quite obviously false and is only used to build a supervised dataset.
A classifier is then trained on this dataset.
In most works, including the one of \textcite{distant}, the model is designed to handle the vast number of false positives in \(\dataSet\bowtie\kbSet\), usually through the aggregate extraction setting (see Section~\ref{sec:relation extraction:definition}).
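The natural join itself is simple to implement; the following sketch of our own uses sets of tuples and a quadratic scan for clarity, where a real system would index \(\kbSet\) by entity pair:
\begin{verbatim}
def distant_supervision(dataset, kb):
    """Label each unsupervised sample (s, e1, e2) with every
    relation r such that (e1, e2, r) is a knowledge-base fact."""
    return {
        (s, e1, e2, r)
        for (s, e1, e2) in dataset
        for (f1, f2, r) in kb
        if (e1, e2) == (f1, f2)
    }
\end{verbatim}
Every entity pair appearing with several relations in \(\kbSet\) thus yields several labeled samples, including the false positives mentioned above.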

A caveat of distantly supervised datasets is that evaluation is often complex.
\Textcite{distant} evaluate their approach on Freebase (Section~\ref{sec:datasets:freebase}) by holding out part of the knowledge base.
However, the number of false negatives forces them to manually label the facts as true or false.